Practical Collapsed Stochastic Variational Inference for the HDP
Abstract
Recent advances have made it feasible to apply the stochastic variational paradigm to a collapsed representation of latent Dirichlet allocation (LDA). While the stochastic variational paradigm has been applied successfully to an uncollapsed representation of the hierarchical Dirichlet process (HDP), no attempt to apply this type of inference to a collapsed representation of non-parametric topic models has been put forward so far. In this paper we explore such a collapsed stochastic variational Bayes inference for the HDP. The proposed online algorithm is easy to implement and includes inference of the hyperparameters. Initial experiments show a promising improvement in predictive performance.

1 Background

We begin by considering a model where each document $d$ is a mixture $\theta_d$ of $K$ discrete topic distributions $\phi_k$ over a vocabulary of $V$ terms. Let $z_{di} \in \{1, \dots, K\}$ denote the topic of the $i$-th word $w_{di} \in \{1, \dots, V\}$ in document $d \in \{1, \dots, D\}$, and place Dirichlet priors on the parameters $\theta_d$ and $\phi_k$. We have

$$z_{di} \mid \theta_d \sim \mathrm{Discrete}(\theta_d), \qquad \theta_d \sim \mathrm{Dirichlet}(\alpha\pi),$$
$$w_{di} \mid z_{di}, \{\phi_k\} \sim \mathrm{Discrete}(\phi_{z_{di}}), \qquad \phi_k \sim \mathrm{Dirichlet}(\beta),$$

where $\pi$ is the top-level distribution over topics, and $\alpha$ and $\beta$ are concentration parameters. While the number of topics $K$ is fixed in latent Dirichlet allocation (LDA), we want the model to determine the number of topics needed. Consequently we follow the assumptions made by the hierarchical Dirichlet process (HDP) [1] of a countably infinite number of topics, of which only a finite number is used in the posterior. Our prior $\pi$ is constructed by a truncated stick-breaking process [2],
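To make the generative model in the Background excerpt concrete, the following is a minimal sketch that samples a small synthetic corpus from it. Several pieces are assumptions not stated in the excerpt: the stick-breaking step uses the standard construction with Beta(1, gamma) sticks and a hypothetical concentration `gamma`, the topic prior Dirichlet(beta) is taken to be symmetric, and every document has the same fixed length `N`. The function names are illustrative and do not come from the paper.

```python
import random


def stick_breaking(rng, gamma, K):
    """Truncated stick-breaking weights pi_1..pi_K that sum to 1.

    Assumption: sticks v_k ~ Beta(1, gamma), truncated by giving the
    last component all remaining mass.
    """
    pi, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        pi.append(v * remaining)
        remaining *= 1.0 - v
    pi.append(remaining)
    return pi


def dirichlet(rng, params):
    """Draw from Dirichlet(params) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
    total = sum(draws)
    if total == 0.0:
        # Numerical guard: if every draw underflows, put all mass on the
        # component with the largest parameter.
        best = max(range(len(params)), key=params.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(params))]
    return [x / total for x in draws]


def generate_corpus(D, N, V, K, alpha, beta, gamma, seed=0):
    """Sample pi, the topics phi_k, and D documents of N word ids each."""
    rng = random.Random(seed)
    pi = stick_breaking(rng, gamma, K)                    # top-level topic weights
    phi = [dirichlet(rng, [beta] * V) for _ in range(K)]  # phi_k ~ Dirichlet(beta)
    docs = []
    for _ in range(D):
        theta = dirichlet(rng, [alpha * p for p in pi])   # theta_d ~ Dirichlet(alpha * pi)
        z = rng.choices(range(K), weights=theta, k=N)     # z_di ~ Discrete(theta_d)
        w = [rng.choices(range(V), weights=phi[k])[0] for k in z]  # w_di ~ Discrete(phi_{z_di})
        docs.append(w)
    return pi, phi, docs


pi, phi, docs = generate_corpus(D=3, N=20, V=50, K=8, alpha=5.0, beta=0.1, gamma=1.0)
print(len(docs), len(docs[0]))
```

Because the weights in `pi` decay along the stick, most documents concentrate on the first few topics even though the truncation level `K` allows more. This sketch covers only the forward (generative) direction and says nothing about the collapsed inference procedure itself.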
Similar resources
Stochastic Variational Inference for the HDP-HMM
We derive a variational inference algorithm for the HDP-HMM based on the two-level stick breaking construction. This construction has previously been applied to the hierarchical Dirichlet processes (HDP) for mixed membership models, allowing for efficient handling of the coupled weight parameters. However, the same algorithm is not directly applicable to HDP-based infinite hidden Markov models ...
Stochastic Variational Inference for HMMs, HSMMs, and Nonparametric Extensions
Hierarchical Bayesian time series models can be applied to complex data in many domains, including data arising from behavior and motion [32, 33], home energy consumption [60], physiological signals [69], single-molecule biophysics [71], brain-machine interfaces [54], and natural language and text [44, 70]. However, for many of these applications there are very large and growing datasets, and s...
Collapsed Variational Bayesian Inference for PCFGs
This paper presents a collapsed variational Bayesian inference algorithm for PCFGs that has the advantages of two dominant Bayesian training algorithms for PCFGs, namely variational Bayesian inference and Markov chain Monte Carlo. In three kinds of experiments, we illustrate that our algorithm achieves close performance to the Hastings sampling algorithm while using an order of magnitude less t...
Collapsed Variational Inference for HDP
A wide variety of Dirichlet-multinomial ‘topic’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of is...
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN generalizes the hierarchical Dirichlet process (HDP) to model correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables and study its statistical...
Journal title:
- CoRR
Volume abs/1312.0412
Publication date 2013